AITopics | crowd counting

Embodied Crowd Counting

Neural Information Processing Systems, Jun-23-2026, 08:21:06 GMT

Occlusion is one of the fundamental challenges in crowd counting. In the community, various data-driven approaches have been developed to address this issue, yet their effectiveness is limited. This is mainly because most existing crowd counting datasets on which the methods are trained are based on passive cameras, restricting their ability to fully sense the environment. Recently, embodied navigation methods have shown significant potential in precise object detection in interactive scenes. These methods incorporate active camera settings, holding promise in addressing the fundamental issues in crowd counting.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Embodied Crowd Counting

Neural Information Processing Systems, Jun-18-2026, 03:19:58 GMT

Occlusion is one of the fundamental challenges in crowd counting. In the community, various data-driven approaches have been developed to address this issue, yet their effectiveness is limited. This is mainly because most existing crowd counting datasets on which the methods are trained are based on passive cameras, restricting their ability to fully sense the environment. Recently, embodied navigation methods have shown significant potential in precise object detection in interactive scenes. These methods incorporate active camera settings, holding promise in addressing the fundamental issues in crowd counting.

artificial intelligence, name change, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.78)

Incorporating Side Information by Adaptive Convolution

Neural Information Processing Systems, Mar-17-2026, 18:22:38 GMT

Computer vision tasks often have side information available that is helpful to solve the task. For example, for crowd counting, the camera perspective (e.g., camera angle and height) gives a clue about the appearance and scale of people in the scene. While side information has been shown to be useful for counting systems using traditional hand-crafted features, it has not been fully utilized in counting systems based on deep learning. In order to incorporate the available side information, we propose an adaptive convolutional neural network (ACNN), where the convolution filter weights adapt to the current scene context via the side information.
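
The idea of filter weights that adapt to side information can be illustrated with a small NumPy sketch. This is not the authors' ACNN implementation: the two-layer generator network, its sizes, the random initialisation, and the helper names `make_filter_generator` and `conv2d_valid` are all assumptions made for illustration. It only shows the mechanism named in the abstract, where the convolution filters are a function of the side information rather than fixed parameters.

```python
import numpy as np

def make_filter_generator(side_dim, out_ch, in_ch, k, hidden=16, seed=0):
    """Hypothetical helper: a tiny MLP mapping side information to conv filters."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (side_dim, hidden))
    w2 = rng.normal(0.0, 0.1, (hidden, out_ch * in_ch * k * k))

    def generate(side):
        # side: e.g. normalised (camera angle, camera height); an assumed encoding
        h = np.tanh(np.asarray(side, dtype=float) @ w1)
        return (h @ w2).reshape(out_ch, in_ch, k, k)

    return generate

def conv2d_valid(x, filters):
    """Plain 'valid' cross-correlation. x: (in_ch, H, W); filters: (out_ch, in_ch, k, k)."""
    out_ch, _, k, _ = filters.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((out_ch, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]
            # Contract the (in_ch, k, k) axes of every filter with the patch.
            out[:, i, j] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

generate = make_filter_generator(side_dim=2, out_ch=4, in_ch=1, k=3)
image = np.random.default_rng(1).random((1, 8, 8))

# Same image, two different camera settings: the filters (and features) change.
low_cam = conv2d_valid(image, generate([0.2, 0.1]))
high_cam = conv2d_valid(image, generate([0.9, 0.8]))
```

In a trained system the generator's weights would be learned jointly with the rest of the network; here they are random, so only the data flow is meaningful.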

artificial intelligence, deep learning, machine learning, (11 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)

118bd558033a1016fcc82560c65cca5f-Paper.pdf

Neural Information Processing Systems, Feb-7-2026, 12:45:27 GMT

DM-Count reduced the error of the state-of-the-art published result by approximately 16%.

artificial intelligence, dm-count, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Incorporating Side Information by Adaptive Convolution

Neural Information Processing Systems, Nov-21-2025, 16:12:27 GMT

Computer vision tasks often have side information available that is helpful to solve the task. For example, for crowd counting, the camera perspective (e.g., camera angle and height) gives a clue about the appearance and scale of people in the scene. While side information has been shown to be useful for counting systems using traditional hand-crafted features, it has not been fully utilized in counting systems based on deep learning. In order to incorporate the available side information, we propose an adaptive convolutional neural network (ACNN), where the convolution filter weights adapt to the current scene context via the side information.

incorporating side information, name change, side information, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)

CrowdVLM-R1: Expanding R1 Ability to Vision Language Model for Crowd Counting using Fuzzy Group Relative Policy Reward

Wang, Zhiqiang, Feng, Pengbin, Lin, Yanbin, Cai, Shuzhang, Bian, Zongao, Yan, Jinghua, Zhu, Xingquan

arXiv.org Artificial Intelligence, Nov-4-2025

Zhiqiang Wang (Florida Atlantic University, Boca Raton, USA, zwang2022@fau.edu), Pengbin Feng (University of Southern California, Los Angeles, USA, fengpengbin.apply@gmail.com)

Abstract: We propose CrowdVLM-R1, which expands the R1 base model for accurate crowd counting, using a novel framework that integrates the fuzzy group relative policy optimization reward function (FGRPR) to enhance learning efficiency. Unlike the conventional binary (0/1) accuracy reward, our fuzzy reward model, FGRPR, which contains both format and precision rewards, provides nuanced incentives to encourage the R1 model to learn to adjust policies towards precise outputs. Supervised fine-tuning (SFT) is also integrated for the CrowdVLM-R1 model to learn from a handful of inputs to enable both in-domain and out-of-domain counting. Experimental results demonstrate that GRPO with a standard binary accuracy reward underperforms compared to SFT. In contrast, FGRPR, applied to Qwen2.5-VL-(3B/7B), surpasses all baseline models, including GPT-4o, LLaMA2-70B and SFT, in five domain datasets. For out-of-domain datasets, FGRPR achieves performance comparable to SFT but excels when target values are larger, as its fuzzy reward function assigns higher rewards to closer approximations. This approach is broadly applicable to tasks where the precision of the answer is critical.

I. INTRODUCTION: Recently, DeepSeek R1 [1] has drawn much attention among advances in large language models (LLMs), as it demonstrates how reinforcement learning (RL) can be the primary driver of reasoning.
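
The contrast between a binary accuracy reward and a fuzzy reward with format and precision terms can be sketched as follows. The abstract does not give the FGRPR formula, so everything specific here is an assumption: the precision term is taken to be one minus the relative error (clipped at zero), the expected output format is a hypothetical `<answer>N</answer>` tag, and the two terms are simply summed. The sketch only shows why a near-miss earns partial credit under a fuzzy reward and nothing under a binary one.

```python
import re

def binary_reward(pred, target):
    """Conventional 0/1 accuracy reward: only an exact count is rewarded."""
    return 1.0 if pred == target else 0.0

def fuzzy_precision_reward(pred, target):
    """Assumed form: 1 - relative error, clipped to [0, 1]."""
    if target == 0:
        return 1.0 if pred == 0 else 0.0
    return max(0.0, 1.0 - abs(pred - target) / target)

def format_reward(response):
    """Assumed format: a single integer wrapped in <answer>...</answer>."""
    pattern = r"\s*<answer>\s*\d+\s*</answer>\s*"
    return 1.0 if re.fullmatch(pattern, response) else 0.0

def total_reward(response, target):
    """Format reward plus precision reward; malformed responses get nothing."""
    fmt = format_reward(response)
    if fmt == 0.0:
        return 0.0
    pred = int(re.search(r"\d+", response).group())
    return fmt + fuzzy_precision_reward(pred, target)

near_miss = total_reward("<answer>98</answer>", 100)   # well-formed, close count
far_miss = total_reward("<answer>50</answer>", 100)    # well-formed, poor count
malformed = total_reward("about 98 people", 100)       # fails the format check
```

Under the binary reward both well-formed responses above would score the same on accuracy, which gives the policy no signal about which one is closer to the target.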

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2504.03724

Country: North America > United States > California > Los Angeles County > Los Angeles (0.54)

Genre: Research Report > New Finding (0.88)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Modeling Noisy Annotations for Crowd Counting

Neural Information Processing Systems, Oct-2-2025, 11:16:34 GMT

The annotation noise in crowd counting is not modeled in traditional crowd counting algorithms based on crowd density maps.

artificial intelligence, density map, machine learning, (13 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Distribution Matching for Crowd Counting Supplementary Material

Neural Information Processing Systems, Oct-2-2025, 02:50:44 GMT

DM-Count and investigate the robustness of different methods to noisy annotations. Assume that for all x ∈ D and g ∈ G we have |g(x)| ≤ B. We propose the following five lemmas, which are essential for proving the proposed theorems. Lemmas A, B, C and D give the Lipschitz constants of different loss functions. Consider the dual form of Eq. (15): W(µ, ν) = max_α ... The first inequality in Eq. (20) is achieved because ... The second equality in Eq. (20) is achieved because ... We restate Theorem 1 in the main paper below.

artificial intelligence, dataset, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Distribution Matching for Crowd Counting

Wang, Boyu

Neural Information Processing Systems, Oct-2-2025, 02:50:37 GMT

Instead, we propose to use Distribution Matching for crowd COUNTing (DM-Count).

artificial intelligence, machine learning, proceedings, (14 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Count2Density: Crowd Density Estimation without Location-level Annotations

Litrico, Mattia, Chen, Feng, Pound, Michael, Tsaftaris, Sotirios A, Battiato, Sebastiano, Giuffrida, Mario Valerio

arXiv.org Artificial Intelligence, Sep-4-2025

Crowd density estimation is a well-known computer vision task aimed at estimating the density distribution of people in an image. The main challenge in this domain is the reliance on fine-grained location-level annotations (i.e., points placed on top of each individual) to train deep networks. Collecting such detailed annotations is tedious and time-consuming, and poses a significant barrier to scalability for real-world applications. To alleviate this burden, we present Count2Density: a novel pipeline designed to predict meaningful density maps containing quantitative spatial information using only count-level annotations (i.e., total number of people) during training. To achieve this, Count2Density generates pseudo-density maps leveraging past predictions stored in a Historical Map Bank, thereby reducing confirmation bias. This bank is initialised using an unsupervised saliency estimator to provide an initial spatial prior and is iteratively updated with an EMA of predicted density maps. These pseudo-density maps are obtained by sampling locations from estimated crowd areas using a hypergeometric distribution, with the number of samplings determined by the count-level annotations. To further enhance the spatial awareness of the model, we add a self-supervised contrastive spatial regulariser to encourage similar feature representations within crowded regions while maximising dissimilarity with background regions. Experimental results demonstrate that our approach significantly outperforms cross-domain adaptation methods and achieves better results than recent state-of-the-art approaches in semi-supervised settings across several datasets. Additional analyses validate the effectiveness of each individual component of our pipeline, confirming the ability of Count2Density to effectively retrieve spatial information from count-level annotations and enabling accurate subregion counting.
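
Two steps from the abstract, the EMA update of the Historical Map Bank and the count-driven sampling of pseudo-locations, can be sketched in NumPy. The abstract does not specify the details, so the momentum value, the integer quantisation of the map (`scale`), and the use of NumPy's multivariate hypergeometric sampler over pixels are assumptions; `ema_update` and `sample_pseudo_points` are hypothetical names, not the authors' code.

```python
import numpy as np

def ema_update(bank_map, predicted_map, momentum=0.9):
    """Exponential moving average of predicted density maps (momentum is assumed)."""
    return momentum * bank_map + (1.0 - momentum) * predicted_map

def sample_pseudo_points(bank_map, count, rng, scale=1000):
    """Draw `count` pseudo-locations, weighted by the stored map.

    Assumed scheme: each pixel receives an integer pool of tokens proportional
    to its weight, and `count` tokens are drawn without replacement.
    """
    weights = np.clip(bank_map, 0.0, None).ravel()
    if weights.sum() == 0:
        raise ValueError("bank_map has no positive mass to sample from")
    scale = max(scale, count)  # guarantees the pool holds at least `count` tokens
    pool = np.ceil(scale * weights / weights.sum()).astype(np.int64)
    draws = rng.multivariate_hypergeometric(pool, count)
    return draws.reshape(bank_map.shape)

rng = np.random.default_rng(0)

# Initial prior: crowd mass only in the left half of a 4x4 map.
prior = np.zeros((4, 4))
prior[:, :2] = 1.0

# A new prediction that agrees with the prior is blended into the bank.
bank = ema_update(prior, prior * 0.5, momentum=0.9)

# The image-level count annotation (7 people here) fixes the number of samples.
pseudo_points = sample_pseudo_points(bank, count=7, rng=rng)
```

The resulting integer map sums to the annotated count and places points only where the bank has positive mass, which is the property a count-supervised pseudo-density map needs.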

annotation, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.0317

Country: Europe > United Kingdom (0.46)

Genre: